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Abstract 

A novel computationally efficient Markov chain Monte Carlo (MCMC) scheme for 
latent Gaussian models (LGMs) is proposed in this paper. The sampling scheme is a 
two block Gibbs sampling scheme designed to exploit the model structure of LGMs. 
We refer to the proposed sampling scheme as the MCMC split sampler. The principle 
idea behind the MCMC split sampler is to split the latent Gaussian parameters 
into two vectors. The former vector consists of latent parameters which appear 
in the data density function, while the latter vector consists of latent parameters 
which do not appear in it. The former vector is placed in the first block of the 
proposed sampling scheme and the latter vector is placed in the second block along 
with any potential hyperparameters. The resulting conditional posterior density 
functions within the blocks allow the MCMC split sampler to handle, by design, 
LGMs with latent models imposed on more than just the mean structure of the 
data density function. The MCMC split sampler is also designed to be applicable 
for any choice of a parametric data density function. Moreover, it scales well in 
terms of computational efficiency when the dimension of the latent model increase. 


1 Introduction 


Latent Gaussian models (LGMs) form a flexible subclass of Bayesian hierarchical models 
and have become popular in many areas of statistics and various fields of applications, as 
LGMs are practical from a statistical modeling point of view and readily interpretable. 
For example, LGMs play an important role in spatial statistics, see Gressie [1993 , Diggle 


et al. [1998 , Delfiner et al. [2009 ; statistical climatology [Gooley et al. 2007 Guttorp 
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and Gneiting 

2006 ; disease mapping fPettitt et al. 

2002 

Lawson ^ 

>013 ; 

volatility models Martino et al. 

, 2011|; and hydrology |Schaefli et al. 

2007 


stochastic 
, to name 

a few. Moreover, LGMs can be viewed as a specific extension of strnctnred additive 
regression models [Fahrmeir et al. 


1994 Rne et al. 2009] , in the sense that, the data 


density fnnction of each data point can depend on more than a single linear fnnctional 


of the latent field throngh more than jnst the mean strnctnre, as discnssed in Martins 
et al. [2013 


Althongh LGMs are well snited from a statistical modeling point of view their pos¬ 
terior inference becomes compntationally challenging when latent models are desired for 
more than jnst the mean strnctnre of the data density fnnction {Martins et al. 2013], 


when the nnmber of parameters associated with the latent model increase; or when the 
data density fnnction is non-Ganssian. The aim of this paper is to propose a novel 
compntationally efficient Markov chain Monte Garlo (MGMG) scheme which serves to 
address these compntational issnes. The proposed sampling scheme is referred to as the 
MCMC split sampler in this paper. It is designed to handle LGMs where latent models 
are imposed on more than jnst the mean strnctnre of the likelihood. It scales well in 
terms of compntational efficiency when the dimensions of the latent models increase and 
it is applicable for any choice of a parametric data density fnnction. The main novelty of 
the MGMG split sampler lies in how the model parameters of a LGM are split into two 
blocks. As a resnlt of the proposed blocking scheme, one of the blocks exploits the latent 
Ganssian strnctnre in a natnral way and becomes invariant of the data density fnnction. 

Markov chain Monte Garlo (MGMG) methods form the backbone of modern Bayesian 
posterior inference and are, in principle, applicable to almost any Bayesian model. How¬ 
ever, the mixing and convergence properties of the MGMG chains can be poor for involved 
models strnctnres and large data sets if parameters that are dependent in the posterior 
are not dealt with properly, see for example Mnrray and Adams 2010] . In particnlar, the 
mixing and convergence properties of the popnlar single site npdating strategy can be 
extremely poor dne to strong dependencies of parameters in the posterior distribntion as 
discnssed in Knorr-Held and Rne [2002 . Several MGMG sampling strategies have been 


snggested for Bayesian hierarchical models to improve the mixing properties of MGMG al¬ 
gorithms. For example, methods based on approximate diffnsions snch as the Metropolis 


Adjnsted Langevin algorithm (MALA), see 

Roberts and Rosenthal 

|1998|; 

on Hamiltonian mechanics (HMG), snggested by 

Neal 

|1993 , which nse 


the target density to drive the proposal mechanism toward regions of higher posterior 
density; manifold methods, proposed by Girolami and Galderhead {2011 , which provide 
a systematic way of designing proposal densities for MALA and HMG by making nse of 
the gradient and cnrvatnre information of the target density; and varions block sampling 
strategies snch as the one block npdating strategy of Knorr-Held and Rne |2002j. Filip 


pone et al. 2013 condncted a detailed comparison of these methods for LGMs and fonnd 


that the single block strategy of Knorr-Held and Rne [2002 , in which the latent field 
and its corresponding hyperparameters are npdated jointly in a single block, performed 
best in most sitnations. Fnrthermore, by nsing nnmerical methods for fast sampling of 
Ganssian Markov random fields (GMRFs) |Rne 2001], the single block sampler can be 


2 



























































































implemented with a low computational cost for LGMs. However, using only a single 
block sampler for LGMs with a non-Gaussian likelihood and high dimensional latent 
fields can be problematic, as parameters accepted in regions of low posterior probability 
can cause the MGMG chain to get stuck. 

Alternative to MGMG methods are deterministic approximate posterior inference 
methods, such as the Integrated nested Laplace approximation (INLA) [Rue et ah] |2009] . 


INLA is a fast approximate inference method for LGMs in which the data density of 
each data point only depends on a single linear functional of the latent field. While 
this assumption holds in many practical cases, there are many models in which we want 
the latent field to enter the data density of a single observation through two or more 

and Martins et al. for further discussion. For example 


parameters, see 

Kneib 

to 
o 
1—‘ 
CO 

in 

Hrafnkelsson et al. 

|2012 


and Geirsson et al. [2015 , latent Gaussian spatial models 


were imposed on the location, scale and shape structure of the data density function. 
The MGMG split sampler is a two block Gibbs sampling scheme [Geman and Geman 


1984 Gasella and George 1992 designed for LGMs, which addresses the aforementioned 


inference problems. The MGMG split sampler is based on the following model setup for 
LGMs, to which we adhere to in this paper. 

Data-level: The observations y depend on the latent field x, through some choice of 
data distribution with a data density function 7r(y | x). 

Latent level: The prior for the latent field x is Gaussian and is potentially dependent 
on hyperparameters 6, with a density function 

7r{x \ 0) = J\f {x \ n{9), . 


Hyperparameter level: A prior distribution is assigned for the hyperparameters 6 , 
with a density function 7r(0). 

The principle idea behind the MGMG split sampler is to split the latent Gaussian param¬ 
eters X into two vectors, r] and u, where rj consists of elements that appear in the data 
density function and u consists of elements that do not appear in it. Thus, the data y be¬ 
come conditionally independent of {u, 0) conditioned on y, that is, n^y \ x,0) = n^y \ rj). 
For the posterior inference, all the model parameters are grouped into two blocks. That 
is, rj is placed in a block we refer to as the data-rich block in this paper, while both 
v and the hyperparameters 6 are placed in another block referred to as the data-poor 
block. A Gibbs sampling strategy is then implemented for each block, conditioned on 
the other block. 

In many practical applications with non-Gaussian data density functions, especially 
in the field of spatial statistics, the vector y in the data-rich block has a complicated 
but low-dimensional conditional posterior structure, while the parameter vector v in the 
data-poor block is often of much higher dimension than that of the parameters in the 
data-poor block [Hrafnkelsson et al. 2012 . Therefore, by using the proposed blocking 
scheme the potentially computationally demanding conditional posterior density in the 
data-rich block contains a minimum number of necessary parameters. Furthermore, the 
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conditional posterior density of v becomes conditionally Gaussian conditioned on the 
parameter vector rj and the hyperparameters 6. This designed structure, that is, the 
minimal dimension of the parameters in the data-rich block and the conditional Gaussian 
posterior structure within the data-poor block, is exploited to implement computationally 
efficient sampling schemes within the blocks. Further, the proposed scheme scales well 
when the dimension of v increases, as discussed in the ensuing paragraph. 

The MGMG split sampler is modular by design such that, in principle, any efficient 
MGMG sampler can be implemented for each block. In this paper, we propose computa¬ 
tionally efficient sampling strategies that are tailored to the particular conditional model 
structure of each block. Within the data-rich block we present a strategy based on the 
gradient and curvature information of the target density that results in an independence 
proposal mechanism as discussed in Rue and Held [2005 . Gonditional independence re¬ 
sulting from the model structure within the block can be utilized in some cases to increase 
acceptance in the Metropolis-Hastings algorithm [Metropolis et ah' 1953, Hastings 1970| . 
In order to update the data-poor block, a modified version of the fast single block updater 
of Knorr-Held and Rue |2002| is proposed, which exploits the fact that the conditional 
posterior of the vector u is Gaussian conditioned on r} and 6. This step is invariant of 
the choice of a data density function as the the data-poor block is updated conditioned 
on the data-rich block. Moreover, if the latent field a; is a GMRF with a sparse precision 
structure, the sampling strategy for the data-poor block is shown to conserve the sparse 
GMRF precision structure, allowing for fast sampling of the corresponding GMRF. 

The paper is organized as follows. Section 2.1 and Section 2.2 are devoted to the 
motivation, introduction and the setup of the MGMG split sampler. The proposed sam¬ 
pling schemes within the data-rich and data-poor blocks are presented in Sections 2.3 and 
Section 2.4, respectively. Examples on the implementation of the MGMG split sampler 
are given in Section 3. In Section 3.1 we present a LGM with a latent spatial model 
structure on mean and log-variance parameters, and show how the MGMG split sampler 
scales well as the dimensions of the latent parameters in the data-poor block increase. 
An example on extremes based on a simulated data set is given in Section 3.2, where 
the focus is on a LGM where latent models are imposed on all three parameters of the 
generalized extreme value distribution. In Appendix A, we give an extension to the sam¬ 
pling scheme proposed in Section 2.3, which is applicable if conditional independence 
assumptions are imposed on the data data density function. Lastly, in Appendix B, we 
show the necessary proofs for the main results in the paper. 


2 The MCMC split sampler 

2.1 Motivation and model setup 

Gonsider, as a motivation and without loss of generality, a data density function 7r(y | 
/X, t) where /x and r are vectors of location and log-scale parameters, respectively, which 
are modeled with latent Gaussian fields. That is, assume the following additive model 
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structure 


fi — “t" T — X'j-f3j- “t" A-j-Uy 

where and X^ are fixed design matrices; and are the corresponding weights; 

and A^- are hxed matrices; and and it,- are structured random effects. In order to 
increase computational stability in the posterior inference calculations and for mathemat¬ 
ical derivations of the MCMC split sampler, we introduce unstructured random effects, 
and e,-, to the model. That is. 


H — X^(3^ -|- A^u^ -|- e^, T — X-j-fij- T A^u-j- -|- e^- 


( 2 . 1 ) 


Small variances can be imposed a priori on the unstructured random effects and er 
if they are not desired in the model. However, adding the unstructured random effects 
is reasonable in many cases from a statistical modeling point of view as they serve as 
error terms for the latent models. Furthermore, adding the and terms to the latent 
models yields an analogous latent model structure as implied by the structured additive 
regression in Fahrmeir et al. |1994| where only the mean is linked to a structured additive 
predictor through a link function. 

Assign the following Gaussian prior density functions to the latent model parameters 




= I AA(/3r I Mr/3, QJ) 

= M{U^ I fl Q-^), 7:{Ur) = M{Ur \ Q~^) 


fifin') fiu 

= I 0, Q~^), 7r{er) = AA(e^ | 0, Q 


( 2 . 2 ) 


where parameters of the prior density functions can potentially depend of a set of hyper¬ 


parameters 6, and Q ^ and are diagonal matrices. 


-1 


■ tie 


As the vector pt in equation (2.1) is a linear combination of /3^, and 


e^, it is 


equivalent to obtain MCMC samples from the posterior distribution of (pi, and 

from the posterior distribution of (/3^, e^). Analogous argument holds for the log- 

scale parameters. The MCMC split sampler is designed to obtain MCMC samples from 
the posterior distribution of (/.t, r, /3^,it,-) as opposed to (/3^,/3.,-, it^, ii,-, e^, e^) 
as in the former parameterization only the vector (pi, r) enters the data density function, 
while all the elements of the latter vector enter the data density function in the latter 
parameterization. This parameterization for posterior inference is along the lines of the 
posterior inference scheme proposed in Rue et ^ |2009|. Thus, define 


^ = (M,'r)'^, iz = (/3^,u^,/3^,Ur)'^ 

which will act as the splitting of the parameters of the latent field. 


The latent model structure in (2.1) and the prior distributions in (2.2) can be written 
in a joint matrix form, which forms the basis for the derivation of the MCMC split 
sampler. Define the following matrices and vectors 


Z = / 




e = ' 


Qe = 


- I 


Qr 
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and group the following parameters and matrices together 






• • \ 

= 


, Qu = 

Q fLU 

Qrp 




V • 

■ QrJ 


where the dotted entries denote zero entries. The additive model structure implied by 


( 2.11 for both latent parameters is thus equivalent to the matrix form 


r] = Zu + e 

and the Gaussian prior assumptions in (|2.2|) are equivalent to 


TT 


{r]\u)=N[r]\ Zu,Q^ , 
'k{u)=N[u\ . 


(2.3) 


(2.4) 


As the data density function and the corresponding parameters were arbitrarily cho¬ 
sen above, analogous derivations can be carried out for any parametric data density 
function and any of its parameters. For example, in addition to imposing latent Gaus¬ 
sian models on the location and log-scale parameters of the generalized extreme value 
distribution a latent Gaussian model can also be imposed on the shape parameter, see 


Section 3.2 for details. Therefore, equations (2.3) and (2.4) are general in the sense that 


most of LGMs used in practice can be expressed in the same form. We will thus adapt 


equations (2.3) and (2.4) as a general setup for the latent model structures for LGMs 


henceforth in this paper. 

The following lemma, based on known results, plays a vital role in the implementation 
of the MGMG split sampler. 


Lemma 1. Assume the distribution assumptions given in (2-4), for any mean vector 
fixed matrix Z, and precision matrices and Q^. The joint prior density function of 
{rj, u) is then Gaussian of the form 


TT 


= M 



fj-u 


Qe -QeZ 

—Z'^Q^ Qy -\- Z~^Qi^Z 


and the conditional density function of v conditioned on rj becomes 


-1 

v\rj 


TT{v\'n)=N [v + Z Q^-n), Q 


(2.5) 


( 2 . 6 ) 


where Qy\^ = Qv + Z^Q^Z. 


See Appendix B.l for proof. Note that, as the vector ( 77 , v) is jointly Gaussian it can 


be viewed as the latent Gaussian vector x in the LGM setup. 
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2.2 The sampling scheme 


The vector rj in (2.3) consists of the parameters of the latent field that explicitly enter 


the likelihood function while the vector ly consists of the parameters of the latent field 
which do not enter it. Therefore, the data vector y is conditionally independent of v 
conditioned on y, that is TT{y | 77 , i/) = TT{y \ y). The parameters y and u are referred to 
as the data-rich and data-poor components of the latent field, respectively, in this paper. 
The corresponding posterior distribution, where the data-poor components of the latent 
held are potentially dependent on a vector of hyperparameters 6, is thus proportional to 


7r{y,u,6 I y) oc n^y \ 77 ) 77 ( 77 , 1 / | 6)7r{6). 


(2.7) 


Using the relationship in (2.7), we propose the following two block MCMC sampling 
scheme to obtain MCMC samples from the posterior density 77(77, i/, 0 I v )- The vector 
77 is placed in the data-rich block, and the vectors u and 6 are grouped together in the 
data-poor block. The MCMC split sampler obtains a sample from the posterior density 
77(77, ^ I ?/) by sampling from one of the blocks conditioned on the other in a Gibbs 

sampling setting. That is, the [k + l)-th MCMC sample from the posterior density 
77(77, ^ I y) is obtained by using the following two block Gibbs sampling scheme 

Data-rich block: sample y^~^^ from 77(77 I 

Data-poor block: sample 0 ^+^) jointly from 77(7/, 0 | y,y^^^) 

This scheme forms the basis of the MCMC split sampler. The potentially involved but 
often low-dimensional structure of the data-rich block is separated from the parameters 
in the data-poor block. By separating the two blocks, MCMC sampling strategies which 
exploit the conditional model structures can be implemented within each block in order 
to increase computational efficiency. Although any computationally efficient MCMC 
samplers are applicable within the blocks, we propose the following sampling schemes 
which are tailored for the conditional model structures of the blocks. The details of the 


proposed samplers for each block are summarized in Section 2.3 and Section 2.4 


2.3 Sampler for the data-rich block 

The conditional posterior density function 77(77 I Ui fbe data-rich block is in¬ 

tractable in most applications. In order to obtain MCMC samples from the conditional 
posterior density function we propose the following Metropolis-Hasting type MCMC 

[20^ . 


algorithm with a tailored independence proposal density Rue and Held 


To construct a computationally efficient independence proposal density, we approx¬ 
imate the conditional posterior density 77(77 I 0! with a Gaussian approximation 
evaluated at the mode of conditional posterior density. Using the logarithm of the con¬ 
ditional posterior, that is 


log 77(77 I y, u, 0) = f{y) - -y^Q^y -h {Q^Zu^y -F const 
where f{y) = log 77(7/ | 77 ) for notational convenience, the following can be shown. 


( 2 . 8 ) 
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Theorem 2. The Gaussian approximation of the conditional posterior density 7r{r] \ y, v, 0) 
is given by 


I y, ^) = AA (77 I 77 °, (Q, - H) 


-I'l 


(2.9) 


where rf is the mode of the conditional posterior density 7r{r] \ 77 , u, 6) and H is the the 
Hessian of the logarithm of conditional posterior evaluated at the mode, H = V^/( 77 *^). 


See Appendix |B.1| for proof. Note that, adding the additive nnstrnctnred error term 
e to the model in ( 2 . 2 ) prevents the the precision matrix in (2.9) from being singnlar and 


thns ensnres nnmerical stability. 


As the Ganssian approximation in (2.9) is constrncted at the conditional posterior 


mode 77 ^, a proposal density q for 77 based on (2.9) thns becomes invariant of the cnrrent 


position of 77 in the MCMC iteration. Therefore, the proposal density q is an indepen¬ 
dence proposal density [Chib and Greenberg 1995 Rne and Held[[2005| . That is, in the 
(k + l)-th iteration the proposal density is invariant of 77 ^, that is q{r]* \ 77 ^) = q{r]*). 

When a new rj* is proposed with the independence proposal density in ( |2.9| ) in the 
(fc -|- l)-th iteration, it is accepted with probability 


a = min < 1 , 


77 ( 77 * I _ g(77^) 

'K{r]^ \y,u,G) q{r]*) 


( 2 . 10 ) 


The logarithm of the ratio in (2.10) can be simplified, as stated in Lemmaj^ in order 
to rednce compntational cost. 

Lemma 3. Assume the proposal density q implied by the Gaussian approximation in 


(2.9) for the data-rich block. The logarithm of the acceptance ratio given in (2.10) can 


be simplified to 

r =fiv*) - 77 * - f{r]^) + H + j y" 

where b = V f{rf) — HTp. 


( 2 . 11 ) 


See Appendix B.2 for proof. As the gradient ^f{y) and Hessian H have already 


been calcnlated to obtain (2.9), the expression in ( |2.11 ) is compntationally efficient to 
calcnlate. 

In many applications conditional independence assnmptions are imposed on the data 
density fnnction. That is, there exists a partition of 77 into snbvectors 77 ^, snch that 7r{y \ 
v) = Hi I yj) which is tnrn implies f{'r}) = where fi is the logarithm 

of the marginal data density fnnction in the i-th partition. In some cases, a proposal 


density based on the Ganssian approximation in (2.9) can be a poor approximation of 

Updating the whole vector 
As a resnlt the 


the conditional posterior density in some partition of 77 . 

77 in one block may then resnlt in the MGMG chain getting stnck. 
compntational efficiency of the sampler is rednced. In order to circnmvent this issne and 


to retain the compntational speed gained by nsing the Ganssian approximation in (2.9) 
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as a proposal density, a modification can be made to the sampling scheme which utilizes 
the conditional independence of the partitions within the data-rich block. The details on 
the modification can be seen in Appendix [A| The resulting sampling scheme is outlined 
in Algorithm 1. Note that by choosing I = 1 in Algorithm 1, the above sampling scheme 
without the conditional independence assumptions on the likelihood is obtained, while 
selecting / > 2 in Algorithm 1 assumes the aforementioned partitioning of rj and that 
each is accepted or rejected separately. 


Algorithm 1 The proposed algorithm for obtaining the {k + l)-th sample from 
7r{r]\y,i/,0) in the data-rich block. By choosing 1=1, the sampling scheme intro¬ 
duced in Section |2.3| is obtained. For / > 2 the modified sampling scheme, which is 
derived in Appendix [A] is obtained for the partitions. 

Input: 

1 : Find the mode rf = argmax \og'K{r}\y,u^,6) 

V 

2 : Calculate H = f{rf) and h = V f{rf) — Hrf 
3: Sample rj* ^ J\f {Q^ — H)~^^ 

4: Calculate p{r}^) and where 



and o denotes an entrywise product 
5: for i = 1,..., I 

6: Calculate n = fi{r]*) + pip*)] 1 - {fiiVi) + p{v’')J l) 

7: Calculate a* = min{l,exprj} 

8: Sample Ui ~ 1) 

9: if ai > Ui 

10 : = y* 

11 : else if ai < Ui 

12 : = y’l 

13: end if 

14: end for 
Output: 


2.4 Sampler for the data-poor block 

The parameters (i^, 0) in the data-poor block are, by construction, conditionally inde¬ 
pendent of y conditioned on the vector y from the data-rich block, that is, 

\y,v)= vr(iz,0 | y). 


9 






The conditional posterior density function of the data-poor block is therefore invariant 
of the choice of likelihood function and proportional to 


7r(i>',0 I y,?7) oc \ r],6)7r{6) 


( 2 . 12 ) 


where the conditional density function 'it{u \ rj, 6) is a Gaussian density of the form given 


in (2.4) are GMRFs with sparse precision structures then the Gaussian density function 


in equation (2.6). Moreover, if the Gaussian density functions in the prior assumptions 


in (2.6) retains the sparse GMRF structure induced by the prior assumption, by known 
results about conditioning of GRMFs |Rue and Held 2005 . Fast sampling algorithms for 


GMRFs can thus be implemented to obtain samples from the Gaussian density function 
in (2.6), as discussed in Rue [2001 


The relation in ( 2 . 12 ) and the Gaussianity of 'k{v \ rj, 0) in (2.6) motivate the following 
Metropolis-Hasti ngs based sampling a l gorith m, which is a modified version of the one 

For some proposal density q{6* \ 6^) for 
is generated jointly as follows: 


block sampler of Knorr-Held and Rue |2002 


the hyperparameters 0, a new proposed value 


e* ~ q{e* I e^) 

V* r^TT{v* I 


Denote the proposal density implied by (2.13) with q{u*,9* \ 
value is then accepted jointly with acceptance probability 


(2.13) 


The proposed 


a = min < 1 . 


7r(i/*,0* I y,T7^+^) q{v^,e^ I 


(2.14) 


In Lemma 1^ we show how the acceptance ratio in (2.14) can be simplified, which is 
modified version of the results shown in Knorr-Held and Rue [ 200 ^ . 


Lemma 4. Assume the proposal density implied by (2.13) for the data-poor bloek, and 
denote the proposal density with q{i'*,0* \ The corresponding acceptance ratio 


in ( 2 . 14 ), can be simplified to 


7T{u*,e* I y, r/^+i) g(i/^, 6'^ \ u*,e*) 7 r( 0 * | r/'^+i) q{e'^ \ 0 


fe+i^ 


7r(i/^ 0^ I y, T 7 *^+i) q{u*,e* \ 0^) 7r(0^ | 77 ^+ 1 ) q{e* \ 6^) 

and is therefore independent of the value ofv. 


(2.15) 


In other words, the acceptance ratio in (2.15) is only dependent on the acceptance 
ratio for 0. Further, since the conditional posterior ti{u \ y, 0) is a known Gaussian 
the proposed sampling strategy scales well in terms of computational efficiency as the 
dimensions of the data-poor component of the latent field u increases. 


When the Gaussian models in the prior assumptions (2.4) are GMRF density func¬ 


tions with a sparse precision structure, the ratio in (2.15) is computationally costly to 
calculate directly, since 7 r (0 | 77 ) oc 77 ( 0 ) 77(77 I I does not necessarily pre¬ 


serve the sparse GMRF structure. However, as the ratio in (2.15) is only dependent on 


the acceptance ratio for 0 it can be shown that the ratio in (2.15) can be rewritten in 


order to preserve the sparse GMRF precision structure, as stated in the Theorem 


10 



















































Theorem 5. The term tt{6 * \ 77^+^)/7r(0^ | ) in p.iJp can be rewritten as 

7r(0* I ^yfc+i) ^(0*) 7r{r]^+^ \0,e*)Tr{0\e*) 7r(0 | 77 ^+^, 0^) 


7r(0^ I T7^+i) 7r(0^) 


7r(0 I r7^+i,0* 


^(77^+1 I O, 0 '=) 7 r(O I 0^) 


(2.16) 


Additionally, the conditional density functions on the right hand side in (2.16) on a 
logarithmic scale are 


log 7r(r7|0, 0) = ^ log det Q, - + const 

log7r(O|0) = - log det Qi/+ const 
log 7r(0|r7, 0) = ^ log det + Z'^Q^Z^ + 

Qv + Z'^Q^z') Z'^Q^rj] Z^Q^rj + const 


(2.17) 


Moreover, if the Gaussian prior density functions in ( 2-4) are GMRFs with sparse pre¬ 


cision structures, then all of the conditional density functions on the right hand side of 


(2.16) are GMRFs with sparse precision structures. 


Theorem 1^ shows how the ratio in (2.14) can be calculated with low computational 
cost by using the results in (2.15), (2.16) and (2.17) in case of GMRFs with sparse 


precision structures. This is a key result for the implementation of the proposed sampling 
scheme in the data-poor block for GMRFs with sparse precision structures. The algorithm 
for the sampling scheme in the data-poor block is is summarized in Algorithm 2. 


3 Examples 

Two examples are presented in this section where the MGMG split sampler is applied to 
obtain posterior samples from the proposed models. In the former example, a data set 
on annual mean precipitation in Iceland is modeled with a LGM that has a spatial model 
structure at the latent level. The latter example is on extreme flood events. 

We will emphasize that the aim of this section is to present some of the possibilities 
offered by the MGMG split sampler rather than to claim which model is the best for 
each data set. The main purpose of the first example is to demonstrate that the MGMG 
split sampler is well suited to infer LGMs with a spatial models on both location and 
scale parameters of the data density function, and that the computational efficiency of 
the sampler scales well as the number of unobserved spatial grid points increases. The 
main goal of the latter example is show that the MGMG split sampler is designed to infer 
LGMs with a non-Gaussian three parameter data density function, where all the three 
parameters are modeled with latent Gaussian models. 
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Algorithm 2 The proposed algorithm for obtaining the {k + l)-th sample from 
7 r(i/, 0 |y, 77 ) in the data-poor block. 

Input: 77^+1) 

1 : Sample each element of 6 * from a proposal density q{6* \ 0^) 

2 : Calculate 


7r(r) 7r(77^+i|0,r)7r(0|r) 7r(0|77^+i, 0^) q{e^ \ 6*) 

7r(0^) 7r(O|77fc+i,0*) 7r(r7fc+i|0,0^)7r(O|0^) ' q{e* \ 6^) 


3 

4 

5 

6 
7 


on a logarithmic scale, using the equations in 
densities functions 
Calculate a = min{l,r} 

Sample 77 ~ ^(0,1) 
if a > 77 

Calcualte = Qv + Z^Q^Z 
Sample u* from 


(|2.17) for the conditional posterior 


i/*|77^+\0* 

8 : (7/^+\0^+^) = 

9: else if a < 77 

10 : {u^+^,e’^+^) = 

11: end if 

Output: (7/^+1,0^+1) 


rsj 
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3.1 Annual mean precipitation in Iceland 


The data set analyzed in this section is on observations on annual precipitation from 
86 observational sites across Iceland, see Figure over the years 1962 to 2006. Times 
series on annual precipitation from the observational sites Reykjavik, Aldey, Akureyri and 
Kvlsker are shown in Figure The data was provided by the Icelandic Meteorological 


Office (IMO). 

A LGM with a SPDE spatial model structure [Lindgren et al.| 2011 at the latent 
level is presented to obtain the spatially varying distributional properties of annual pre¬ 
cipitation over the domain. We will demonstrate that the computational efficiency of the 
MCMC split sampler scales well as the number of grid points in the mesh in the SPDE 
approach increase. 



-23 -21 -19 -17 -15 

Longitude 


Stations 
Reykjavik 
^ TEdey 
^ Akureyri 
A Kvisker 


Figure 1: The / = 86 observational sites in Iceland. Reykjavik is marked with red, A16ey 
is marked with blue, Akureyri is marked with green and Kvlsker is marked with purple. 


Model setup 

The data level: The data were modeled with a LGM assuming the Gaussian distri¬ 
bution for the observations and conditional independence over the observational sites. 
That is, let ya denote the annual precipitation at observational site i at year t then the 
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Year 


Figure 2: Times series over the years 1962 to 2006 on annual precipitation. The time 
series are based on observations from Reykjavik (red curve), dUSey (blue curve), Akueyri 
(green curve) and Kvisker (purple curve). 


data density function becomes 

AA(?/jt I/Uj,exp(ri)), i t = 

where I is the number of sites, T is the number of years; Hi and r* are mean and log- 
variance parameters, respectively, which are both allowed to vary spatially. 


The latent level: The following model structure was implemented for the mean pa¬ 
rameter /X = (/ii,..., m) at the latent level of the model. 


/X — X-\- e 


-/i? 


where Xfj, is a design matrix consisting of a vector of ones and covariates that are based 
on the meteorological model of Crochet et al. |2007| , see Geirsson et al. [2015 for details; 
(3^ are the corresponding weights; denotes a Matern type spatial held constructed 
with the SPDE approach [Lindgren et al. 2011 on a triangulated mesh over the spatial 
domain with a smoothness parameters chosen as one, which corresponds to an almost 
once differentiable Matern held and a = 2 in the SPDE method; is a known projection 
matrix; the matrix product A^tx^ then denotes the spatial effect at the observational 
sites, which captures the spatial variation in the data that is unexplained by the covariate 
and is an unstructured random effect. Analogous model structure is also implemented 
for the log-variance parameter, that is 


T = X-rf3^ + AsUr + Cr- 

where Xj- is a design matrix consisting of a vector of ones and the aforementioned 
meteorological covariate on a logarithmic scale. 
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Working within the LGM setup, the following prior density functions were assigned 
to parameters at the latent level of the model. 

I T^iPr) = I 0 , 

7r('U^) = N{u^, I 0, Q~l), 7r{Ur) = N{Ur I 0, Q~^), 

7r(e^) =M{e^ \ 0,a^^J), 7r(e^) = AA(e^ | 

The parameter values = 0.0025 and KjSr = 0.25 were fixed in the prior distributions 
for /3^ and f3^. The precision matrices and Qu^ are constructed with SPDE ap¬ 
proach, and have sparse GMRF precision structures. Further, the precision matrix 
has two parameters, au/i and Ku^, which serve as hyperparameters of the spatial model 
for fi. The hyperparameters and are related to the marginal variance and range 
of the spatial field, respectively. Analogous structure holds for The parameters 
and a‘1^ are unknown variance parameters for the unstructured random effects. 

The hyper level: Let 6 denote all the hyper parameters of the model that are not 
fixed, that is 

^ CT-uT) ^uti ^et)- 

Lognormal prior distributions with fixed parameters were assigned to the hyperparmeters 
in e. 


Posterior inference 


In order to apply the MGMG split sampler to the aforementioned model, the model 
parameters are assigned to the data-rich block which includes rj = and the data- 

poor block which consists of f3^,Ur) and the hyperparameters 9. 

The aforementioned model setup and prior assumptions are equivalent to the setup 


implied in equations (2.3) and (2.4) with 




A, 


Qu = 




\ 


Q 


Ufl 


Q, = 

\ 




QurJ 


a-H 


(3.1) 


Data-rich block: The conditional posterior 'K{r} \ y, u, 9) in the data-rich block is 

intractable. However, the logarithm of the conditional posterior of the data-rich block is 


of the same form as in equation (2.8), with Z and defined in equation (3.1) and 


/(^) = = logA/'(?/it|//j,expri), 


(3.2) 


i=l 


i=l tGAi 
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where the set Ai contains the indices of the years t observed at site i. By model as¬ 
sumptions, the vectors r/j = become conditionally independent in the conditional 

posterior 7r(77 | y,u,0) over observational sites i. This demonstrates that the modifi¬ 
cation of the sampling scheme in Section 2.3 outlined in Appendix is applicable. 


Therefore Algorithm 1 was used to obtain MCMC samples from the conditional poste¬ 
rior TT{rj I y,u,6) with I = 86, as in (3.1) and f{r]) as in (3.2). 


Data-poor block: In order to implement the sampling strategy outlined in Section [2^ 
and to obtain MCMC samples from the conditional posterior 'k{u,6 \ y,r]), a proposal 
density q for the hyperparameters 6 must be chosen. In this example, the proposal 
strategy suggested in |Knorr-HeId and Rue 


2002 is used for each element of 6 . That is, 


let 0 * = fOf where the scaling factor / has the density 

7r(/)ocl-M// for/e[l/F,T] 


(3.3) 


where T > 1 is a tuning parameter. Knorr-Held and Rue| |2002| show that this is a 


symmetric proposal density in the sense that q{0*\9^) = q{9^\9*). Therefore, by using 


this proposal density, the acceptance probability in equation (2.14) simplifies to 


a = min < 1 


tt{0 * I rj 


fc+i'i 


7r(0” I 


Moreover, the ratio in (2.16) in Theoremj^was used to calculate the acceptance probabil¬ 
ity, which preserves the sparse GMRF precision structure induced by the SPDE approach. 
Thus, Algorithm 2 is was implemented to obtain MCMC samples from the conditional 
posterior 7 r{r] \ y, u, 0), with Z, and defined in (3.1) and the proposal density in 


(3.3). 


Convergence diagnostics 

The following convergence diagnostics are based on four MCMC chains sampled in par¬ 
allel with the MCMC split sampler from the proposed model. Each chain was calculated 
with 50000 iterations where 10000 iterations were burned in. The posterior inference was 
carried out separately for three different mesh resolutions. That is, a coarse resolution 
based on 411 mesh points; a medium resolution based on 858 mesh points; and a dense 
resolution based 1752 mesh points. In Figure]^ the three different meshes are presented 
on a same scale. The top, middle and bottom panels in Figure]^ show the coarse res¬ 
olution, medium resolution and dense resolution meshes, respectively. Runtime, on a 
modern desktop (Ivy Bridge Intel Core i7-3770K, 16GB RAM and a solid state hard 
drive), was approximately 6, 6.5 and 7 hours for the coarse, medium and dense mesh 
resolution, respectively. All calculations were carried out using R. 

Gelman-Rubin plots, based on the MCMC runs, of the the mean parameter y, in 
Reykjavik; the covariate coefficient /3^2] and the marginal standard deviation for the 
spatial field <7^^ are shown in the first, second and third column, respectively, in Figure 
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Figure 3: The triangulated meshes over the spatial domain, based on the coarse mesh 
(top left), medium mesh (top right) and dense mesh (bottom). 


The results based on the coarse resolution, medium resolution and dense resolution 
meshes for the aforementioned parameters are shown in the first, second and third row, 
respectively, in Figure]^ A comparison of the results in Figurebetween the different 
mesh resolutions reveals that the convergence in the mean is achieved for the the three 
parameters at a similar rate. Furthermore, the Gelman-Rubin plots in Figure]^ show 
that the sampler has converged in the mean after roughly 7500 iterations for all mesh 
resolutions. Similar results hold for all of the other model parameters (results not shown). 

Autocorrelation plots for the same set of parameters and arranged identically as in 
Figure are shown in Figure The results demonstrate that the MCMC chains for 
the mean parameter /i in Reykjavik and the covariate coefficient /3^2 exhibit a negligible 
autocorrelation after lag 10. The MCMC samples of the hyperparameter au^ show auto¬ 
correlation around 0.3 at lag 50. Similar results hold for all the other model parameters 
(results not shown). 

Relying on the Celman-Rubin statistics and the autocorrelation plots, the MCMC 
chains exhibit all signs of having converged. Moreover, the autocorrelation plots in Figure 
[5] reveal that the autocorrelation in the MCMC chains does not increase with number 
of mesh points, which in turn indicates that the autocorrelation in the MCMC chains 
is invariant of the dimensions of the data-poor part of the latent field u. These results 
demonstrate that the MCMC split sampler retains its computational efficiency when the 
number of mesh points increases, which is to be expected as the acceptance probability 
in (2.15) in the data-poor block in independent of the u. 
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Furthermore, as the acceptance probability in (2.15) within the data-poor block is 
only dependent on the hyperparameters, the autocorrelation seen in MCMC chains for 
the hyperparameter au^ in Figure]^ is mainly affected by the choice of proposal density 
for 0, which is in this example the sampler in (3.3). In Section 3.2 we will demonstrate 
the modularity of the MCMC split sampler, by choosing another proposal density for 
the hyperparmeters 6 in Algorithm 2 which significantly reduces the autocorrelation in 
MCMC chains for 0. 
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Figure 4: Gelman-Rubin plots for fj,, /3^2) and aufi for three different mesh resolutions. 
The red solid line denotes the median of the Gelman-Rubin statistics, and the green 
dashed line denoted the upper limit of the 95% confidence interval for the Gelman- 
Rubin statistics. The first row is based on a coarse resolution (411 mesh points). The 
second row is based on a medium resolution (858 mesh points). The third row is based 
on a dense resolution (1752 mesh points). The results demonstrate the MGMG split 
sampler has converged in the mean after roughly 7500 iterations. 


3.2 Flood analyzis 

In this section, we present a simulation study on extreme events. The data set consists of 
simulations of monthly maximum instantaneous flow based on characteristics of ten river 
catchments around Iceland. The characterizing features that were used to simulate the 
data for each river were chosen as river catchment area and maximum daily precipitation, 
as both river catchment area and maximum daily precipitation are known to be positively 
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Figure 5: Autocorrelation plots for j3^2, and for three different mesh sizes. The 
first row is based on a coarse resolution (411 mesh points). The second row is based on a 
medium resolution (858 mesh points). The third row is based on a dense resolution (1752 
mesh points). The results demonstrate that the autocorrelation in the MCMC chains 
decays rapidly and in invariant of the number of points in the mesh. 


correlated with maximum instantaneous flow, see Davidsson and |Crochet et al. 

The simulated time series were chosen to be 150 years. 


Model setup 

The data level: The data were modeled with a LGM assuming the generalized extreme 
value distribution (g.e.v.) for the observations. To that extend, let ymj,t denote the value 
from river j at month m and year t, with a cumulative density function of the form 

= exp (l + U, 

if 1 + ^mjix — ymj)/^i > 0) F{Vit) = 0 otherwise. The parameters /imj, CTmj and are 
the location, scale and shape parameters of the g.e.v. distribution for river j in month m. 
Additionally, J is the number of rivers and T is the number of years. Further, the data 
is assumed independent between both rivers and between months. These assumptions 
were made for demonstrative purposes. 
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The latent level: The location and scale parameters are modeled on a logarithmic 
scale at the latent level, which is modelling setup along the lines presented in |Cunnan^ 


and Nash [1971 and [GREHY |1996|. Thus, define Xmj = log/^mj and Tmj = ^oga. 


mj- 


The shape parameter is modeled on its native scale. 

As discussed in DaviSsson (2^, the underlying processes of monthly maximum 
instantaneous flow exhibit a seasonal behavior. Therefore, the following seasonal model 
is proposed for the location parameter on a logarithmic scale. That is. 


Xmj — /3o,A T '^ 0 ,m,X T ^l,mj{Pl,X T m,x) 

+ . . . + Xp^m,jiPp,X T '^p,m,x) T ^mj,X (^'d) 

where /3 o,a denotes an overall intercept term; Xi^mj denotes the i-th covariate in month 
m at the j-th river; (3i^x denotes the weight of the i-th covariate for f = 1,... ,p; rio,m,A 
denotes the seasonal random effect of the m-th month; Ui^m,x denotes the seasonal addi¬ 
tional weight of the i-th covariate within month m; and emj,x denotes an unstructured 
random effect. 

In order to write the model in a matrix form for the implementation of the MCMC 
split sampler, combine the location parameters for river j over months. That is, 

• • • > j = 1, • • •, 


and define the following 

"^ijA ~ ('*^1,1,A) • • • ) ^i,12,A) ) -^i,j ~ diag(Xjji, . . . , Xiji2'), 

where i = 0,... ,p and xo,jm = 1 denotes the intercept term for river j and month m. 
Additionally, define 


/l XI 




jd 


odd \ 


1 Xij^2 ■ ■ ■ Xp^i^2 

Vl ■■■ Xp^i^uJ 


, Aj — [Aqj, ..., Apj) . 


The seasonal model presented in (3.4) for the log-location parameter for river j can be 
written in matrix form as 


\j — Xj(3x + AjUx €j^x 

where (3x = {l3o,x, ■ ■ ■, Pp,xV, ux = {uq^x, ■ ■ ■ ,Up^xV, and ej^x = {eij,x, ■ ■ ■ Ai 2 j,x)- By 
combing the seasonal model over rivers, the following holds 


where 


A — XPx + Xux + ex 


A = 

/Ai\ 

,X = 


,A = 

(AA 


( ei,A\ 


Udj 


Udj 


Udj 


Vej,A/ 
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Analogous model structure was also implemented for the log-scale parameter. That is, 


T = XI3^ + Aur + e^. 

A reduced model with a similar structure was implemented for the shape parameter 
That is 


^mj — /^O,^ T T ^mj,^ (^•^) 

where /3o,.c denotes an overall intercept term; denotes the seasonal random effect 

of the m-th month; and emj,^ denotes an unstructured random effect. The full matrix 
model for ^ becomes 


$ = li 2 J/do,^ + (Ij ® -^12) 

where denoted an n-dimensional vector of ones. 

Working within the LGM framework, the following prior density functions were as¬ 
signed to the latent parameters. First assign. 


I 0, 7r(/3^) = \ 0, a}^I) 7r{P^) = \ 0, 

The parameters and are assumed a priori to have a low precision on their 

native scales in order to let the data play the dominate role in their inference. Thus, 
the parameter values = 4, apr = 4 and = 2 were chosen for the prior density 
functions. 

Secondly, the selection of prior density functions for the seasonal random effects 
needs to incorporate a correlation structure that induces a strong correlation between 
neighbouring months. This is achieved by assigning the following prior density functions 

7r(tXA) = Miux I 0, diag(i/>;^) (g) 

7r(ur) = J\f{ur \ 0, diag(i/>^) 0 

'k{u^) =M{u^ I 

where i/’a = (V’o,A) • • • > 'ipp,x)'^, V’r = (V’o,t, • ■ •, V'p.t)'*' and serve as scaling parameters 
for the monthly random effects corresponding to the three intercepts and the covariates; 
and the is a 12 x 12 circular band precision matrix that has the vector 

[1 - 2 (^ 2 + 2 ) + + e -2 {k‘^ + 2) 1 ] 

on the diagonal band, as discussed in ||Lindgren et ah which capture the autocor¬ 

relation between months. In this example, the decay parameters was fixed to simplify 
the inference and set equal to k = 1. Further, this value of k induces an autocorrelation 
a priori between consecutive months. Third, for the unstructured random effects, the 
following priors were chosen. 

7r(eA) = AA(eA I 0,cr^A^)) 7r(e.r) = AA(e^ | 0, 7r(eg) = A/'(eg | 0, 
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The hyper level: Let 6 denote all the hyperparameters of the model that are not fixed 
on a logarithmic scale for computational purposes. That is, 

0 = ( log V’O.A, • ■ • , log V’p,A, log V' 0 ,r, • • • , log , log , log CT^x, log O'er > log 

Gaussian prior distributions with fixed parameters were assigned to the hyperparmeters 

in e. 

Posterior inference 

The data-rich block includes rj = (/x, r, and the data-poor block consists of ix = 
f3^,UT-, /3^,u^) and the hyperparameters 0. For the implementations of the 
MCMC split sampler, define the following sparse matrices 


J 

fx A ■■ ■ ■ \ 

L 1 

1 

b 


■ ■ X A . 

,a = 


1 

\ • • • • ll2J Il2jJ 

' 1 



and 

= hdiag{a^ll, diag(t/j;,) 0 diag(V>^) ® (3.7) 

where bdiag denotes a block diagonal matrix. 


Data-rich block: The modified version of the sampling scheme in Section 2.3 outlined 
in Appendix [A| was used to obtain MCMC samples from the conditional posterior 7r{r] \ 
The logarithm of the conditional posterior is of the same form as in equation 
( |2.8| ), with Z and defined in equation (3.6) and 

12 J 12 J T 

fiv) = (3.8) 

m=l j=l m=l j=l t=l 

where vTgev denotes the density function of the generalized extreme value distribution. 
Therefore, Algorithm 1 was used to obtain MCMC samples from the conditional poste¬ 
rior from the data-rich block, with / = J • 12 = 120, as in (3.6) and f{ri) as in (3.8). 


Data-poor block: The sampling scheme outlined in Section 2.4 was used to obtain 
MCMC samples from the conditional posterior 7r(ix,0 | y,ri) in the data-poor block. A 
proposal density based on the normal distribution centered on the last draw of 0, as 
discussed in Roberts et al. 1997 , was selected for Algorithm 2, with a precision matrix 


—cH where H is a finite difference estimate of the Hessian matrix of log 7r(0|f)) evaluated 
at the mode. That is, 


H ^V^log7r{e\ri)L 


e=0o 


(3.9) 
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where i) is the maximum likelihood estimate of r] for each river and month; Oq is the 
mode of Iog7r(log0|f7); and c is a scaling constant. Conditioning on f), as opposed of 
^k+i £qj. example, removes the necessity to estimate H in every iteration. Moreover, 
setting a specific scaling u removes the need for tuning. The scaling c = 2.382/dim(0) 
was implemented, as it is optimal in a particular large dimension scenario, see IRoberts] 
The resulting proposal density therefore becomes 


et al. 1997 


q{e*\e^) =Af(e* \ 


-1 


(3.10) 


Algorithm 2 was thus implemented to obtain MCMC sampled from the conditional pos¬ 
terior within the data-poor block, with as in equation (3.6); Qy as in equation 


(3.7); and the proposal density in (3.10). 


Convergence diagnostics 

As in Section [3.1[ the following convergence diagnostics are based on four MCMC chains 
sampled in parallel with the MCMC split sampler. Each chain was calculated with 50000 
iterations where 10000 iterations were burned in. Runtime on the same desktop as in 
Section [3.1 1 was approximately 7 hours. 

The left panel in Figure [^compares the empirical cumulative distribution from river 
j = 1 in January with its posterior cumulative distribution functions based on the MCMC 
runs. The right panel shows the corresponding probability - probability plots. These 
result indicate that the model describes the data well. Which in turn demonstrates that 
the MCMC split sampler recaptures the known underlying model, which was used to 
generate the simulated data. Analogous results hold across all rivers and months (results 
not shown). 

Furthermore, as the data was generated from a known model setup, the results of the 
inference based on the MCMC runs can be compared to the known values of the model 
parameters. In Figure the known values of the seasonal random effects are shown 
along with the corresponding 95% posterior intervals. The top panel in Figure shows 
this comparison for the seasonal random effect Uo,m,A for the log-location parameter A 
as a function of months. The middle and the bottom panels in Figure show the same 
comparison for Uo,m,T for the log-scale parameter and for the shape parameter 

respectively. The results reveal that the 95% posterior intervals for the seasonal random 
effects contain their known values. These results demonstrate that the MCMC split 
sampler recaptures the known seasonal random effects. Similar results hold for all other 
model parameters (results now shown). 

Gelman-Rubin plots and auto-correlation plots for nine model parameters based on 
the MCMC run are shown in Figures [^ and respectively. Both plots are based on the 
same set of parameters and arranged identically. Three parameters were chosen from the 
location, scale and shape structures of the proposed model which were placed in the first, 
second and third rows of Figures and [^ respectively. The first columns are based on 
parameters from the data-rich part of the latent field; the second columns are based on 
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parameters from the data-poor part of the latent field; and the third column is based on 
hyperparamters. 

The Gelman-Rubin plots in Figure show that the sampler has converged in the 
mean after roughly 10.000 iteration. Similar results hold for all the model parameters 
(results not shown). Furthermore, the autocorrelation plots in Figure [^demonstrate that 
the MCMC chains for the parameters from both the data-rich and the data-poor parts 
of the latent field, exhibit a negligible autocorrelation after lag 10. The hyperparameters 
show a negligible autocorrelation after lag 30. Relying on these results, the MCMC chains 
exhibit all signs of having converged. Moreover, these results further indicate that the 
MCMC split sampler, with the modified proposal density of Roberts et al. 1 11997| implied 
by equation (3.9) for the hyperparameters, is highly computationally efficient in both the 
data-rich and data-poor blocks. 



14 16 18 

Sorted Observations 



Figure 6: The left panel shows the empirical cumulative distribution of maximum in¬ 
stantaneous flow from river J = 1 in January (black solid curve) and the posterior mean 
of the corresponding posterior cumulative distribution functions (blue solid curve) and 
corresponding 95% posterior interval (blue dashed curve). The right panel shows a 
probability-probability plot of maximum instantaneous flow from river j = 1 in January, 
along with 95% posterior intervals. 


4 Discussion 

The main advantage (or novelty) of the MCMC split sampler lies in how the proposed 
blocking scheme leads to conditional posterior structures within in the two blocks, which 
can be exploited in order to construct computationally efficient sampling schemes for each 
block. Additionally, the MCMC split sampler is, in principle, designed as a modular 
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Figure 7: The top panel in shows the known value (denoted with the blue entries) of the 
seasonal random effect no,m,A for the log-location parameter A as function of months m. 
The middle panel shows the known value (denoted with the green entries) of the seasonal 
random effect no,m,T for log-scale parameter r. The bottom panel shows the known value 
(denoted with the red entries) of the seasonal random effect no,m,.j for the shape parameter 
The errors bars in all panels represent the corresponding 95% posterior intervals based 
on the MCMC-runs. 


sampling scheme in the sense that any MCMC sampling scheme can be implemented 
within the blocks. Therefore, in the authors’ view, the MCMC split sampler presents 
an interesting area of future research as new sampling schemes for either block can be 
developed independently of the other block. 

In the data-rich block, we proposed a Metropolis-Hastings algorithm with an in¬ 
dependence proposal density which was constructed with a Gaussian approximation of 
the conditional posterior density evaluated at its mode. Furthermore, we proposed a 
modification the sampler which is applicable if conditional independence assumptions 
are imposed on the data density function. The modification can potentially increase the 
computational efficiency of the sampler, as discussed in Appendix [A| 

Although the proposed sampler in the data-rich block is computationally efficient, it 
is only applicable in practice if the mode of conditional posterior density function can be 
found, and can be calculated reasonably fast. For example, in the case of models where 
each observed data point has more than one unique data density parameters associated 
with it, say of the type yi ~ '^iyi\yi,o'i) for every measurement i, finding the mode of 
the conditional posterior 7r(/rj, ai\yi) becomes computationally impractical in some cases. 
Models of this type include, for example, certain spatial temporal models [Hrafnkelsson 


et al. 2012 . Similar computational issues also arise if data dependence at the data-level of 


a LGM is desired, see for example Davison et al. [2012 where t-copulas are implemented 
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Figure 8: The figure shows Gelman-Rubin plots based on the MCMC run. The first, 
second and third rows in the first column are based on a randomly chosen log-location 
parameter A; covariate coefficient 13 and hyperparameter i/jx, respectively. The second 
and third row show an analogous set of parameters based on the log-scale and shape 
structures, respectively, of the likelihood. 
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with g.e.v. marginal density functions at the data level as a model for spatial extremes. 
However, in both of the aforementioned cases different sampling scheme for the data-rich 
block can be implemented without changing the sampling scheme of choice in the data- 
poor block. For example, sampling scheme based on MALA and HMC type algorithm 
are well suited for the structure of the data-rich block in both cases. 

In the data-poor block, the conditional posterior density TT{v'\ri,6) is a Gaussian of 
the form (2.6) and invariant of the data density function. These results serves as one of 
the main computational advantage introduced by the MGMG split sampler due to the 
following reasons. First, as the conditional posterior 7r(u\r],0) is Gaussian, a modified 
version of the one block sampler of Knorr-Held and Rue |2002|, wh ich is known to be 
a highly efficient sampling scheme when applicable [Filippone et ah' 2013| , becomes ap¬ 
plicable within the data-poor block regardless of the data-density function at the data 
level. As a consequence, computationally efficient sampling algorithms can be used to 
sample from the exact conditional posterior Gaussian density 7r(i/'|?7, 0 ). Further, if the 
prior density functions in (2.4) have a sparse GMRF precision structure, then 7r{iy\r],6) 
preserves the sparse structure as discussed in Section 2.4 which in turn allows for highly 
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Figure 9: The figure shows auto-correlation plots based on the MCMC run. The first, 
second and third rows in the first column are based on a randomly chosen log-location 
parameter A; covariate coefficient 13 and hyperparameter i/jx, respectively. The second 
and third row show an analogous set of parameters based on the log-scale and shape 
structures, respectively, of the likelihood. 


efficient sampling algorithms for the Gaussian density 7r(i.'|77, 0). Therefore, the proposed 
sampling scheme in Section 2.4 for the data-poor blocks scales well in terms of computa¬ 
tional speed and efficiency with increasing dimensions of the data-poor part of the latent 
field, which is of great importance to achieve, especially in the field of spatial statistics. 

Second, as the conditional posterior density Tr{iy\r],6) is a known Gaussian and the 
acceptance rate in the sampling scheme for the data-poor block in Section |2.4| is only 
dependent on hyperparameters, the computational efficiency of the proposed sampling 
scheme for the data-poor block is only dependent of the sampling scheme used for the 
hy p er par amet er s. 
ular, 


In this sense, the sampling scheme in Section 2.4 is in itself mod- 

Ghoosing a 


that is, any proposal density for the hyperparmeters is applicable, 
computationally efficient sampling scheme for the hyperparameters can thus increase the 
computational efficiency of the overall sampling scheme within the data-poor block, as 
demonstrated in the examples in Section That is, in Section 3.1 we demonstrated 
how a proposal density implied by equation ( |3.3[ ) may be implemented due to its sim¬ 
plicity. However, in Section 3.2 we proposed a modified version of the sampling scheme 
of Roberts et al. |1997| implied by equation (3.9), which reduces the autocorrelation in 
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the MCMC chains. In practical terms, the proposal density implied by (3.9) can also be 


implemented Section [3. H which in turn reduces the autocorrelation in the MCMC chains 
for the hyperparameters (results omitted). 

Due to the modularity of the MCMC split sampler, sampling schemes for the data- 
rich block can be developed and improved independently of the sampler in the data- 
poor block, and vice versa. Additionally, as the conditional posterior density 7r{iy\r],6) 
becomes invariant of the data in the data-poor block, the computational advantages 
introduced by the conditional posterior structure in the data-poor block hold for all 
LGMs. Moreover, the MCMC split sampler can be applied to various LGMs as it is 
designed to handle LGMs where latent models Gaussian models are imposed on more 
than just the mean structure of the data density function. Thus, in our view, further 
developing and improving sampling schemes that utilize the computational advantages 
introduced by the MCMC spilt sampler presents an interesting area of future research. 
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Appendices 


A Conditionally independent data density functions 

In many applications of LGMs, conditional independence assumptions are imposed on 
the likelihood function, see for example the discussion in Rue et al. [2009 . That is, by 
the model assumptions there exists a partition of rj into subvectors rj^, i = 1,... ,1, such 
that 


-^{y I v) = I 


(A.l) 


i=l 


where Tri{y^\ri^) denotes the marginal data density functions of the i-th partition. The 


conditional independence assumptions in (A.l) imply /(t?) = ^iftiVi), where fi is the 


logarithm of the marginal data density function 7ri{y^\r]j_). As discussed in Section 2.3 
the Gaussian approximation in (2.9) can be, in some scenarios, a poor approximation 


of the conditional posterior density in some partition i, which in turn may cause the 
sampler to get stuck and thus lose its efficiency. To address this issue, we suggest a 
modified version of the sampler proposed in Section [2.3| which retains the computational 
speed gained by using a Gaussian approximation as a proposal density and is applicable if 
conditional independence assumptions are imposed on the data density function. Before 


we introduce the modifications to the sampling scheme in Section 2.3 we give a few 
essential technical results for the modifications, which are as follows. 


Lemma 6. Assuming the conditional independence assumptions in (A.l). Then, every 
pair of vectors and such that i i', become conditionally independent in the 
posterior given [y,v,6). In other words, the following relation holds 


T^iVi I y, v-i, t', s) = I y, 


(A.2) 


for all partitions i. Furthermore, the conditional posterior density of is independent 
of y_i, which denotes the subvector of the data vector y that is not observed in partition 


31 









i. That is, the following relationship holds 


I = TT{r]i I yi,u,e) 


(A.3) 


for all partitions i 


For notational simplicity and based on the relation in (A.2) and (A.3), we denote 
the conditional posterior density of r|^ with TTi{r]^ \ y,u,0) henceforth. The following 
corollary yields a useful identiy for the conditional posterior with TTi{ri^ \ y, u, 6). 


Corollary 7. Assume the conditional independence assumptions in (A.l). The logaritym 
of the conditional posterior density \ y, v, 6) is 

logvri(r7i I Vi + K (A.4) 

where K is a constant. 

The subsequent corollary is the immediate from Theorem]^ Lemmaj^and the relation 
in (A.4). 


Corollary 8. Assuming the conditional independence assumptions in (A.l), the Gaus¬ 
sian approximation of the conditional posterior \ y,u,6) within each partition i 

is 


(A.5) 


\y,d,u)=M ( r/i I rji, (Q, - 


where rfl denotes the mode of the marginal posterior density function T^i{'r]i \ y, u, 6) and 
{Q^ — iT)(j j) denotes the submatrix of {Q^ — H) belonging to partition i. 

The next corollary shows the relation between the Gaussian approximation density 
functions in (2.9) and (A.5). 


Corollary 9. Assume the conditional independence assumptions in (A.l). Furthermore, 
let 7r{r) \ y, v, 0) denote the Gaussian approximation of the conditional posterior density 
7r{r] I y,u,6), and TTi{rj^ \ y,u,6) denote the Gaussian approximation of the conditional 
posterior \ y, v, 6) within partition i. Then the following relation holds 


n{ri I y,u,e) = ]Jifi(?7i I y,i^,^)- 


(A.6) 


The modifications to the sampler proposed in Section |2.3| are based on the following 
observations. From the conditional posterior independence relation in (A.2) in Lemma|^ 
follows directly 

I y, v-i, 0) = T^iVi I y, Q) 

for all partitions i. It is therefore equivalent to update rif\y,r}_j^,u,0 iteratively over 
partitions i, with a Gibbs sampling approach using (A.5) as a proposal density and 
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to update ri^\y,v,6 separately over partitions i. By updating separately as opposed to 
iteratively, the number of functions calls is reduced, which in turn reduces computational 
cost. 

However, in practical terms it is faster to compute the mode once by computing the 
maximum of log Tr{r]\y, v, 6) than computing the maximum of log T:i{rij\y, u, 6) separately 
in every partition i. This is due to the fact that number of function calls increases as I 
increases in the numerical optimizing methods when finding the mode of log TTi{ri^\y, ^,6) 
within every partition i. Furthermore, the computational cost of calculating the function 
log7r(T7 I y,u,6) in (2.8) is minimal and scales well as the dimension of the data-poor 
block increases, as the is a diagonal matrix and Z is a fixed sparse matrix. Thus, 
calculating the gradient and the Hessian matrix of the conditional posterior is also com¬ 
putationally fesiable in many cases. 


The relation in (A.6) demonstrates that it is equivalent to propose a new vector rj* 


from the normal approximation n{r] \ y, u, 0) and to propose new vectors rj* separately 
from \ y, i', 6) for every i. Therefore, to reduce computational cost and to increase 
computational efficiency, we propose the following modifications the the sampling scheme 


in Section 2.3 That is, propose a new vector y* from q{rf) = 7f(r/ | y,u,0) and accept 


and reject y* within each partition separately with the probability 


Q!j = min < 1 


I y, I 

TT{y*\y,v,e) / ff(77f I y, I/, 0) 


(A.7) 


The acceptance ratio in (A.7) can be calculated separately over partitions i with a low 


computational cost, as demonstrated in the following corollary. 


Corollary 10. Assume the conditional independence assumptions in (A.l) and adopt 


the same notation as in Lemma The logarithm of the acceptance ratio in {A.7), that 

I y,i^,0) /ixirfl I y,u,e) 


IS 


n = log 




where ^i{yi \ y,u,6) denotes the Gaussian approximation of the conditional posterior 
I within partition i. Then ri can he simplified to 

n = h{yl)+p{y*)Jl- Uiiv^) + 

for all i, where fi is the logarithm of the marginal data density function in partition i 
and 

T 


Piv) = [nV H + b 


y 


for notational simplicity, where o denotes an entrywise multiplication. The corresponding 
acceptance probability in partition i is thus 

ai = min {1, exp r^} . 

The sampling scheme for the data-rich block with the aforementioned modifications 
is summarised in Algorithm 2 
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B Proofs 

B.l Proof of Lemma [T] 


Proof. By know results about Gaussian distributions and inverses of block matrices, the 
joint distribution of {rj, v) is given by 


TT 




= N 




( Q, -Q,Z \ 

Q, + Z'^Q.Z) ) 

(Qf^ + ZQf^Z^ ZQ-^\\ 

V Q-^Z^ Q-i )) 


The conditional distribution u conditioned on r] follows directly from Lemma 2.1 in [Rue 


and Held 2005 , that is, 

7r(i/ \rj)=Af(u + Z'^Q^r)), 


where = Qu + Z^Qe^- 


□ 


B.2 Proof of Theorem 

Proof. The conditional posterior density function TT{r] \ y,i/,0) is proportional to the 
product of the data density function and the conditional Gaussian prior density 7r(77 | 
i',9) given by (2.4), that is 

7r(77 I y, u, 6) oc 7r(y | y)'K(rt \v',e). 


Thus, the logarithm of the conditional posterior density function is given by 

logvr(r 7 | y, u, 6) = f{y) - + {Q^Zu^y + const 

where f{y) = log7r(y | y) for notational convenience. The second order Taylor approxi¬ 
mation of f{y) expanded around the mode yf^ of the conditional posterior n^y \ y, u, 6) 
is 

fiv) ~ /(h°) + - V°) + - vPyH{y - J7°) 

= ^y^ Hy + (V/(? 7 °) — Hy^Yy + const. 

Gonsequently, the second order Taylor approximation of log7r(T7 | y,i',6) expanded 
around y^ becomes 

log7r(77 I y, u, 6) « ^y'^Hy + {Vf{yY - Hy^^y - ^y'^Q.y + {Q^Zu^y + const 
= {Qe - H)y + {Q^Zu + bYy + const, 
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where b = {Vf{r]^) — Hrf). This derivation yields a Gaussian approximation with a 
mean vector 

{Q,-H)-\Q,Zu + h) 

and covariance matrix {Qe-HT ^. However, as the vector is the mode of the condi¬ 
tional posterior function 7r(77 | y, u, 0) the following relation holds 

V log 7r(77° I y, ly, 6) = Vf{rf) - Q^rjo + = 0. 

The mean of the Gaussian approximations becomes 

iQ,-H)-\Q,Zty + b) 

= {Q, - H)-^ {Q,Zu + Vf{rf) - Hrj^) 

= (Q, - H)-^ {Q.r,^ - Hrj^) 

= (Q, - H)-^ {Q, - H)rj^ = 

Thus, a Gaussian approximation of the conditional posterior density function 7r{r] \ 
y, u, 6) evaluated at the mode is given by 

{v\ (Qe - Hy^). 

□ 


B.3 Proof of Lemma I 


Proof. The logarithm of the acceptance ratio given in (2.10) is 


r = log 


Trjri* I y,iy,e)qy 
7r(77^ I y,iy,e)q{r]* 


(B.l) 


where 7r(77 | y,u,6) is the conditional posterior density function given in (2.4) and giy) 
is the proposal density based on the Gaussian approximation in (2.9). The right hand 
side term in (B.l) can be written as 

log7r(r/* I y,u,e) - logq{y*) 

- (log 7r(?7^ I y, u, 6) + log 


Since the proposal density q is based on the Gaussian approximation in (2.9) the following 
holds 

log 7r(77 I y, u, 6) - log q{y) = f{y) - + {Q^Z^y 

- (^\yHy + b'^y - ^y'^Q.y + {Q.Zufy'^ + const 

= fiv) ~ \ \y Hy + b^yj -I- const 


which yields the results in (2.11). 


□ 
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B.4 Proof of Lemma [4] 


Proof. By definition of the proposal density in (2.13) the following holds 

qiu’^,0’^ I u*,e*) _ n{u’^ \ \ 6*) 

q{u*,e* I ~ 7r{u* \ r]’^+\e*)q{e* \ 6^)' 

where q{6*\6^) is some proposal density for 6 and 7r(^' | r/, 6) is the conditional Gaussian 
density function in (2.6) in Lemma[^ Therefore, the acceptance ratio in (2.14) can be 
written as 


7 r(i/*,r I y,r 7 ^+i) 7 r(i/^ | q { e ^ \ 6 *) _ 7 r( 6 »* | g( 6 »^ | 6 *) 

I | 77^+1,0*) q{e* \ 0 ^) ~ 7r(0^ | q{e* \ 6 ^) 


(B.2) 


since 'K{v,6\y,r]) = 7 r(i/, 0 |? 7 ), as discussed in Section 2.4, and tt(u,0 | r])l'K{y \ri,9) = 


7r{0 I 77 ) for any t],u and 0. The result in (B.2) demonstrates that the acceptance ratio 


in (2.14) is only dependant on 6 within in the proposed setup. In other words, the 
acceptance ratio in (2.14) becomes independent of the value of u. □ 


B.5 Proof of Theorem [5] 

Proof. In order to rewrite 7 r (0 | rj) in ( |2.15 ) we use the relation 

7 t {6 I 77) OC 77(0)77(77 I ^)- 

Further, by the law of conditional probability, the following holds 

77 ( 77 , 1 / I 6) 77(77 I i/,0)vr(i/ I 6) 


77(77 I 0) = 


77 ( 7 / I 77 , 0 ) 


77 ( 1 / I 77,0) 


(B.3) 


(B.4) 


As 77(77 I 0) is independent of the value of i/, it follows that the two ratios in (B.4) are 
invariant of the choice of 7/. In particular, the following holds 


77(77 I = 


77(77 I 0 , 0)77(0 I 0 ) 
77(0 I 77 , 0 ) 


(B.5) 


by choosing the value 7 / = 0. Combining (B.3) and (B.5) yields 


77(0* I 


77 ( 0 *) 77 ( 77 *=+l I e * 


77 ( 0 ^ I 77^+1) 77(0^)77(77^+1 I 0 ^) 


77 ( 0 * 


77(77^+1 1 o, 0 *) 77 (o I e*) 

X ^-rA— , , ( by —^ X 


77(0 I 77 ^+ 1 , 0 ^ 


77 


(0 I 77 *^+!, 0* 


77 ( 77^+1 I O,0^)77(O I 0^) 


Moreover, if the Gaussian prior density functions in (2.4) are GMRFs with sparse pre¬ 


cision structures, then all of the conditional density functions on the rightmost side of 


(B.4|) are GMRFs with sparse precision structures, by known results about conditioning 

□ 


on subvectors as demonstrated in Theorem 2.5 in Rue and Held 2005 
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B.6 Proof of Lemma [6] 

Proof. As the matrix is a diagonal matrix the following holds 

r{r] I u,e) =M {r]\Zu,Qf^) 

I 


TT 


2=1 


(B.6) 


where denotes the submatrix of belonging to partition i. Let 

which serves as the conditional prior density function for the data-rich part of the latent 
field belonging to partition i. The relation in (B.6) along with the conditional indepen¬ 
dence assumptions in (A.l) yield 

-K{r} I y, U, 0) oc 7r{y \ 'n)'K{r) \ i', 0) 

I 

= YlMy^ I Vi)'^iiViW,G) 

2=1 

which demonstrates that the vectors and are conditionally independent in the 
conditional posterior given [y, u, 0), for all i ^ i'. 

Furthermore, the conditional independence assumptions in (A.l) also yield 


I y, 0) oc 7r(r/i I Vi)-Kiiyi \ u, 6) 
oc I 

which demonstrates that 7r{r]^ \ y,u,0) is independent of y_j. 

B.7 Proof of Corollary 


(B.7) 

□ 


Proof. The conditional independence assumptions in (A.l), the relation in (B.6) and the 
relation in (B.7[) yield 


log7ri(?7i I yi,u,e) = log7r(yi | +log7ri{r]i \u,e) + K 

1 
2 


= fiiVi) - lvjQe,ii,i)Vi + Vi+K 


for every partition i. 


□ 


B.8 Proof of Corollary 

Proof. The result follows by using Theorem]^ on the log7rj(?7j | y,u,6) given in (A.4) 
instead of log7r(?7|y, i/, 0). □ 
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B.9 Proof of Corollary 


Proof. The relation in ( |A.2[ ) and ( |A.3 ) in Lemma (|^ implies that the elements of mode 
77 ® belonging to partition i are also the mode of every partition i. The 

results then follows from Lemma Theorem and Corollary □ 


B.IO Proof of Corollary |10| 

Proof. As conditional independence are imposed over partitions i, Lemma yields the 
following 


log7r(77 I y,6,u) - log 71(77 I = fiv) - Hr] - h'’’ 77 + const 


= “ 2 '^JH(i,i)Vi -bjriij + const. 

i ^ 

The results follows by similar derivations as in the proof of Lemma 


□ 
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